Twitter API Design Evaluation and Latency Budget

Introduction#

After familiarizing ourselves with Twitter's services and their endpoints, this lesson focuses on the last two aspects of designing the API: how to meet the non-functional requirements and how to estimate the response time of our Twitter API. We'll also discuss some interesting scenarios related to timelines and try to optimize the service using different approaches.

Non-functional requirements#

Let's discuss how we can achieve the non-functional requirements of the API for Twitter services.

Availability#

We ensure the availability of the system even during unexpected spikes (for instance, a celebrity’s Tweet needs to be delivered to millions of followers in a timely manner) by having loosely coupled services run separate tasks concurrently and statelessly. An example of such loosely coupled services is the usage of the pub-sub service between the Tweet service and the timeline service. The pub-sub service decouples our two main services and queues multiple concurrent Tweets during peak hours. Furthermore, we use a monitoring system that helps us detect anomalies, such as overloading the service due to excessive requests. To prevent excessive requests, rate limiting helps us reduce network traffic by restricting users' access to the Twitter API for a certain period of time. For example, the user can post a maximum of 15 Tweets per minute.
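The rate limit mentioned above (at most 15 Tweets per minute per user) can be enforced with a sliding-window counter. Below is a minimal Python sketch; the class name and the in-memory store are illustrative assumptions, not part of Twitter's actual implementation:

```python
import time
from collections import defaultdict, deque

class SlidingWindowRateLimiter:
    """Allows at most `limit` requests per `window` seconds per user."""
    def __init__(self, limit=15, window=60.0):
        self.limit = limit
        self.window = window
        self.requests = defaultdict(deque)  # userID -> request timestamps

    def allow(self, user_id, now=None):
        now = time.monotonic() if now is None else now
        q = self.requests[user_id]
        # Drop timestamps that have fallen out of the current window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) < self.limit:
            q.append(now)
            return True
        return False
```

With the defaults, the sixteenth Tweet inside a single minute is rejected, and posting resumes once the oldest request ages out of the window.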

Reliability #

We use circuit breakers to identify and recover from bad situations as quickly as possible for our services. Also, we eliminate the single point of failure by routing the request to any available replica service. Furthermore, we use the backend for the frontend (BFF) approach for our API gateway to make it reliable and available because our services are used by different clients (mobile and website). For example, if the Twitter website is down (which is very rare), the service of the mobile application will not be affected by this downtime because the BFF handles each frontend or client type independently.
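A circuit breaker of the kind mentioned above can be sketched as follows; the threshold, timeout, and simplified half-open behavior are illustrative assumptions:

```python
import time

class CircuitBreaker:
    """Opens after `threshold` consecutive failures, then fails fast
    until `reset_timeout` seconds pass, after which one trial call is
    let through (the classic half-open state, simplified)."""
    def __init__(self, threshold=3, reset_timeout=30.0):
        self.threshold = threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, now=None):
        now = time.monotonic() if now is None else now
        if self.opened_at is not None:
            if now - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: allow a trial call
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = now
            raise
        self.failures = 0
        return result
```

Failing fast instead of waiting on a struggling downstream service is what lets the system "recover from bad situations as quickly as possible": callers get an immediate error they can handle, and the downstream service gets breathing room.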

Scalability #

The stateless nature of HTTP allows us to divide the load of incoming requests across multiple servers. We also use a mixture of push and pull models for timeline updates, depending on the user type (active or inactive), to serve a large number of users.
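The hybrid push-pull fan-out can be sketched as below; `feed_cache` and `active_users` are hypothetical in-memory stand-ins for the feed cache and the user-activity tracking:

```python
def fan_out_tweet(tweet_id, author_id, followers, feed_cache, active_users):
    """Hybrid model: push the Tweet into the feed cache of active
    followers; inactive followers pull it lazily on their next
    timeline read instead of paying the fan-out cost up front."""
    pushed, deferred = [], []
    for follower in followers:
        if follower in active_users:
            # Push path: prepend so the feed stays newest-first.
            feed_cache.setdefault(follower, []).insert(0, tweet_id)
            pushed.append(follower)
        else:
            deferred.append(follower)  # resolved via the pull path
    return pushed, deferred
```

For a celebrity account with millions of mostly inactive followers, the deferred list dominates, which is exactly why the pull path keeps fan-out writes manageable.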

Security #

Twitter's service authenticates and authorizes users by using their credentials. The credentials must be encrypted through HTTPS/TLS. Typical users can authenticate themselves using the Authorization: Basic <encodedCreds> header. Moreover, we adopt the OAuth/OIDC code authentication with the PKCE mechanism for third-party logins (such as through Google, Apple, etc.).
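Constructing the Authorization: Basic <encodedCreds> header is a standard Base64 encoding of `username:password`, as the short sketch below shows. The encoding is not encryption, which is why the surrounding HTTPS/TLS channel is mandatory:

```python
import base64

def basic_auth_header(username, password):
    """Builds the value of the Authorization header for basic auth.
    Safe to send only over HTTPS/TLS, since Base64 is reversible."""
    creds = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(creds).decode("ascii")
```

For example, `basic_auth_header("alice", "s3cret")` yields `"Basic YWxpY2U6czNjcmV0"`.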

Low latency#

For systems like Twitter, where the number of users and the volume of data increase daily, we maintain as low a latency as possible by pre-generating the timelines and storing them in the feed cache to serve the active users. Since the system is read-heavy and users are likely to read the most recent Tweets, it makes sense to keep them in the cache. Furthermore, we adopt cursor pagination, which helps us paginate the Tweets efficiently using the next-cursor pointer to avoid unnecessary load on the network, client, or server side. Also, links to popular data (such as images, videos, scripts, and so on) are generated dynamically, which directs the user to the nearest CDN. The client can download popular media content from the CDNs, which allows us to improve the response time. The following illustration shows how users can retrieve trends or Tweets related to them from the nearest CDNs instead of fetching them from the origin server.
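Cursor pagination with a next-cursor pointer might look like the following sketch, where the cursor is the last tweetID of the previous page (an assumed convention; real APIs typically use an opaque token):

```python
def get_timeline_page(tweets, cursor=None, page_size=20):
    """Paginates `tweets` (sorted newest-first). `cursor` is the
    tweetID after which the next page starts; returns (page,
    next_cursor), where next_cursor is None on the last page."""
    start = 0
    if cursor is not None:
        ids = [t["tweetID"] for t in tweets]
        start = ids.index(cursor) + 1
    page = tweets[start:start + page_size]
    next_cursor = (page[-1]["tweetID"]
                   if start + page_size < len(tweets) else None)
    return page, next_cursor
```

Unlike offset pagination, the client resumes from a stable position even as new Tweets are prepended, and the server never has to skip over rows it won't return.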

The origin trend service stores the copy of popular trends and Tweets related to the user on the regionally distributed CDNs

If the user wants to see the Tweets related to popular trends, then the user request is routed to the nearest CDN

The CDNs return the Tweets related to specific trends to the users instead of routing their requests to the origin server

Achieving Non-Functional Requirements

Availability
  • Use API monitoring tools to detect any quirky behavior
  • Use rate limiting to limit excessive incoming requests
  • Use loosely coupled services and the stateless nature of requests
  • Use caching, such as the feed cache

Reliability
  • Eliminate the single point of failure by routing requests to any available replica service
  • Use the backend for frontend pattern for different client types
  • Use a circuit breaker

Scalability
  • Provide timeline updates to a large number of users using a hybrid push-pull model
  • Route requests to any available replica service

Security
  • Carry out basic authentication via user credentials
  • Authenticate and authorize via the OAuth/OIDC token with the PKCE mechanism for third-party access
  • Use TLS for the secure exchange of data

Low latency
  • Pregenerate timelines for active users
  • Use a cache to store recent Tweets and timelines for active users
  • Paginate so that data is delivered in intervals
  • Use CDNs to serve data swiftly

Latency budget#

This section calculates the response time for our Tweet and timeline APIs. We'll start by estimating the sizes of the request and response messages for each service, and then calculate the response time of each service.

Note: As discussed in the “Back-of-the-Envelope Latency Calculations” chapter, the average RTT remains the same for a GET request regardless of the data size (due to the small request size), while the time to download the response varies by 0.4 ms per KB. For POST requests, the RTT grows by 1.15 ms per KB of request data on top of the base RTT of 260 ms.

Tweet API#

The Tweet API stores a Tweet and returns the created Tweet attributes in response. We assume that our Tweet API allows us to post the following type of content in Tweets, as mentioned earlier:

  1. Text, mentions, or hashtags—the length of which cannot be more than 280 characters

  2. Up to 4 images

  3. One video with a max size of 10 MBs

Content types 2 and 3 require first uploading the media files and then using only the mediaID in the request to post a Tweet. In order to efficiently utilize the network bandwidth, the returned response for these contains only the media links.

Request and response size#

Since the POST request will contain both media and Tweet text, we will estimate the sizes of each as follows:

  • Media size and response time: Since we have already estimated the response time of files in the file upload API, we will use the same procedure for the media file response time. Let's assume a user wants to post a Tweet that includes three images with some text. As we have seen above, we send a preflight request to upload any media file prior to posting the Tweet along with the mediaID. Let's estimate the time taken to upload the media files. For this, assume each image file is 175 KB, which takes around 414–560 ms (estimated using the file upload API calculator) to upload. To keep the estimate conservative, we will assume the maximum possible latency, which, in this case, is 560 ms.

  • Tweet size: For posting a Tweet, the request headers take 1 KB and the request body takes 1.12 KBs because the request body comprises multiple data entities, such as tweetID, userID, content, mediaID, and other fields. The final request size is the summation of the request header and body, as shown below:

Request\ size = 1\ KB + 1.12\ KB = 2.12\ KB

The response consists of the header and body sizes. We take the header size of 1 KB, and the response body size is 3 KB. Normally, the POST request takes a smaller response size than the request size. However, in our case, we're getting a response size slightly larger than the request size because we're getting the posted Tweet with other data entities, such as userID, tweetID, and information related to the mentioned profiles, hashtags, media, etc., in a response body.

Response\ size = 1\ KB + 3\ KB = 4\ KB

Response time#

We calculated the message size for the Tweet request. Now, it's time to estimate the response time of our Tweet API. We take some values (say, the base time, processing time, and RTT) from the “Back-of-the-Envelope Calculations for Latency” chapter.

We can use the following calculator, by entering the request and response sizes, to get the Tweet API's minimum and maximum response times.

Response time calculator for Tweet API

Request size: 2.12 KB
Response size: 4 KB
Minimum latency: 384.538 ms
Maximum latency: 465.538 ms
Minimum response time: 388.538 ms
Maximum response time: 469.538 ms
Minimum media file upload time: 414 ms
Maximum media file upload time: 560 ms
Minimum user-perceived response time: 802.538 ms
Maximum user-perceived response time: 1,029.538 ms

Assuming that the request size is 2.12 KBs and the response size is 4 KBs:

Time_{latency} = Time_{base} + RTT_{post} + Time_{Download}

RTT_{post} = RTT_{base} + 1.15 \times request\ size

Time_{Download} = 0.4 \times response\ size

Time_{latency\_min} = Time_{base\_min} + RTT_{base} + (1.15 \times request\ size) + (0.4 \times response\ size)

= 120.5 + (260 + 1.15 \times 2.12) + (0.4 \times 4) = 384.538\ ms

Time_{latency\_max} = Time_{base\_max} + RTT_{base} + (1.15 \times request\ size) + (0.4 \times response\ size)

= 201.5 + (260 + 1.15 \times 2.12) + (0.4 \times 4) = 465.538\ ms

Similarly, the response time is calculated as follows:

Response_{min} = Latency_{min} + Time_{processing\_min} = 384.538\ ms + 4\ ms = 388.538\ ms

Response_{max} = Latency_{max} + Time_{processing\_max} = 465.538\ ms + 4\ ms = 469.538\ ms

The user-perceived response time for uploading three images with text ranges from roughly 802.538 ms to 1,029.538 ms, as shown below:

Time_{total\_min} = Time_{media\_upload\_min} + Response_{min} = 414 + 388.538 = 802.538\ ms

Time_{total\_max} = Time_{media\_upload\_max} + Response_{max} = 560 + 469.538 = 1029.538\ ms
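The Tweet API arithmetic above can be checked with a few lines of Python; the constants come straight from the formulas in this section:

```python
BASE_MIN, BASE_MAX = 120.5, 201.5    # min/max base time, ms
RTT_BASE = 260                       # base RTT for a POST, ms
PER_KB_UPLOAD = 1.15                 # RTT growth per KB of request
PER_KB_DOWNLOAD = 0.4                # download time per KB of response
PROCESSING = 4                       # server processing time, ms
UPLOAD_MIN, UPLOAD_MAX = 414, 560    # media upload time for 3 images, ms

def post_latency(base, request_kb, response_kb):
    # Time_latency = Time_base + RTT_post + Time_download
    return (base + (RTT_BASE + PER_KB_UPLOAD * request_kb)
            + PER_KB_DOWNLOAD * response_kb)

lat_min = post_latency(BASE_MIN, 2.12, 4)    # 384.538 ms
lat_max = post_latency(BASE_MAX, 2.12, 4)    # 465.538 ms
resp_min = lat_min + PROCESSING              # 388.538 ms
resp_max = lat_max + PROCESSING              # 469.538 ms
total_min = UPLOAD_MIN + resp_min            # 802.538 ms
total_max = UPLOAD_MAX + resp_max            # 1029.538 ms
```

Running this reproduces every number in the calculator above, which makes it easy to re-budget the latency if, say, the request size or per-KB rates change.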

A summary of the latency budget for a request for the Tweet service is shown in the illustration below:

Latency and processing time of the Tweet service

Let's now move to the timeline service response time.

Timeline API#

The Tweet API returns only the recently posted Tweet in response. However, the timeline API returns the stream of Tweets with ads, people, and trends.

Request and response size #

We assume that the request size for a GET request is 1.52 KB and the response size is 170 KB. The request consists of some essential headers and the userID, at most. Let's assume that the response body includes 80 Tweets, five trends, and two promoted ads. If the size of each Tweet is 2 KB and the size of the top five trends is 2.5 KB, whereas the two ads and three user accounts are of sizes 5 KB and 1.5 KB, respectively, then the total size is equal to:

Response\ size = 170\ KB

The size of the GET request is generally smaller than the response because the request body is empty. Therefore, we will only consider the response size to estimate the request's response time.
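As a sanity check, the itemized sizes do sum to the stated 170 KB if we assume the remaining 1 KB is the response header (the same header size used for the Tweet API; this allocation is our assumption, since the lesson does not itemize it):

```python
TWEET_KB = 2        # per Tweet, 80 Tweets in the page
TRENDS_KB = 2.5     # top five trends combined
ADS_KB = 5          # two promoted ads combined
ACCOUNTS_KB = 1.5   # three suggested user accounts combined
HEADER_KB = 1       # assumed response header, as in the Tweet API

response_kb = 80 * TWEET_KB + TRENDS_KB + ADS_KB + ACCOUNTS_KB + HEADER_KB
# 160 + 2.5 + 5 + 1.5 + 1 = 170 KB
```

The 80 Tweets dominate the payload, which is why pagination (limiting the Tweets per page) is the main lever for trimming this response.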

Response time#

Let's calculate the response time by setting the values of response size in the following calculator:

Response time calculator for the timeline API

Response size: 170 KB
Minimum latency: 258.5 ms
Maximum latency: 339.5 ms
Minimum response time: 262.5 ms
Maximum response time: 408.5 ms

Assuming that the response size is 170 KBs, the latency is calculated by:

Time_{latency\_min} = Time_{base\_min} + RTT_{get} + 0.4 \times size\ of\ response\ (KBs) = 120.5 + 70 + 0.4 \times 170 = 258.5\ ms

Time_{latency\_max} = Time_{base\_max} + RTT_{get} + 0.4 \times size\ of\ response\ (KBs) = 201.5 + 70 + 0.4 \times 170 = 339.5\ ms

Similarly, the response time is calculated using the following equation:

Time_{Response} = Time_{latency} + Time_{processing}

Now, for the minimum response time, we use the minimum values of base time and processing time:

Time_{Response\_min} = Time_{latency\_min} + Time_{processing\_min} = 258.5\ ms + 4\ ms = 262.5\ ms

Now, for the maximum response time, we use the maximum values of base time and processing time:

Time_{Response\_max} = Time_{latency\_max} + Time_{processing\_max} = 339.5\ ms + 69\ ms = 408.5\ ms
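The timeline numbers can likewise be reproduced directly from the formulas in this section:

```python
BASE_MIN, BASE_MAX = 120.5, 201.5    # min/max base time, ms
RTT_GET = 70                         # fixed RTT for a GET, ms
PER_KB_DOWNLOAD = 0.4                # download time per KB of response
PROC_MIN, PROC_MAX = 4, 69           # min/max processing time, ms

def get_latency(base, response_kb):
    # Time_latency = Time_base + RTT_get + 0.4 * response size
    return base + RTT_GET + PER_KB_DOWNLOAD * response_kb

lat_min = get_latency(BASE_MIN, 170)   # 258.5 ms
lat_max = get_latency(BASE_MAX, 170)   # 339.5 ms
resp_min = lat_min + PROC_MIN          # 262.5 ms
resp_max = lat_max + PROC_MAX          # 408.5 ms
```

Note that the 68 ms download term (0.4 × 170) is a sizable share of the minimum latency, another reason the pagination and CDN measures above pay off for the timeline.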

A summary of the latency budget for a GET request for the timeline service is shown in the illustration below:

Latency and processing time of the timeline service

In this chapter, we designed services to post Tweets and view a timeline in the Twitter API. Additionally, we addressed the non-functional requirements of the service by employing the appropriate technologies. Finally, in the “Latency budget” section, we estimated that the service would respond in near real-time.
